Abstract. In this paper, we extend the notion of entropy in a natural manner for a mixed-pair random variable, a pair of random variables with one discrete and the other continuous. Our extensions are consistent in that there exist natural injections from discrete or continuous random variables into mixed-pair random variables such that their entropy remains the same. This extension of entropy allows us to obtain sufficient conditions for entropy preservation under bijections between mixed-pair random variables. The extended definition of entropy leads to an entropy rate for continuous time Markov chains. As applications of our results, we provide simpler proofs of some known probabilistic results. The framework developed in this paper is best suited for establishing probabilistic properties of complex processes, such as load balancing systems, queuing networks, and caching algorithms, that have inherent discrete variables (choices made) and continuous variables (occurrence times).

1. Introduction

The notion of entropy for discrete random variables as well as continuous random variables is well defined. Entropy preservation of discrete random variables under bijective maps is an extremely useful property. For example, Prabhakar and Gallager [PG03] used this entropy preservation property to obtain an alternate proof of the known result that Geometric processes are fixed points under certain queuing disciplines.

In many interesting situations, including Example 1.1 given below, the underlying random variables are mixtures of discrete and continuous random variables. Such systems exhibit natural bijective properties which allow one to obtain non-trivial properties of the system via non-rigorous "information preservation" arguments. In this paper we develop sufficient conditions to make such arguments rigorous.

We will extend the definition of entropy to random variables that form a mixed pair of discrete and continuous variables as well as obtain sufficient conditions for preservation of entropy. Subsequently, we will provide a rigorous justification of mathematical identities that follow in the example below.



Example 1.1. Poisson Splitting: Consider a Poisson process, $\mathcal{P}$, of rate $\lambda$. Split the Poisson process into two baby-processes $\mathcal{P}_1$ and $\mathcal{P}_2$ as follows: for each point of $\mathcal{P}$, toss an independent coin of bias $p$; if the coin turns up heads then the point is assigned to $\mathcal{P}_1$, else to $\mathcal{P}_2$. It is well-known that $\mathcal{P}_1$ and $\mathcal{P}_2$ are independent Poisson processes with rates $\lambda p$ and $\lambda(1-p)$ respectively.

The entropy rate of a Poisson process with rate $\mu$ is known to be $\mu(1 - \log \mu)$ nats per second. That is, the entropy rates of $\mathcal{P}$, $\mathcal{P}_1$, and $\mathcal{P}_2$ are given by $\lambda(1 - \log\lambda)$, $\lambda p(1 - \log \lambda p)$ and $\lambda(1-p)(1 - \log\lambda(1-p))$ respectively. Further observe that the coin of bias $p$ is tossed at a rate $\lambda$ and each coin-toss has an entropy equal to $-p\log p - (1-p)\log(1-p)$ nats.

It is clear that there is a bijection between the tuple ($\mathcal{P}$, coin-toss process) and the tuple $(\mathcal{P}_1, \mathcal{P}_2)$. Observe that the joint entropy rate of the two independent baby-processes is given by the sum of their entropy rates. This leads to the following "obvious" set of equalities.

$$
\begin{aligned}
\mathbb{H}_{ER}(\mathcal{P}_1, \mathcal{P}_2) &= \mathbb{H}_{ER}(\mathcal{P}_1) + \mathbb{H}_{ER}(\mathcal{P}_2) \\
&= \lambda p(1 - \log \lambda p) + \lambda(1-p)(1 - \log\lambda(1-p)) \\
&= \lambda(1 - \log\lambda) + \lambda(-p\log p - (1-p)\log(1-p)) \\
&= \mathbb{H}_{ER}(\mathcal{P}) + \lambda(-p\log p - (1-p)\log(1-p)).
\end{aligned}
$$

The last sum can be identified as the sum of the entropy rate of the original Poisson process and the entropy rate of the coin tosses. However, the presence of differential entropy as well as discrete entropy prevents this interpretation from being rigorous. In this paper, we shall provide rigorous justification of the above equalities.
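
The chain of equalities in Example 1.1 can be checked numerically. The following is a minimal sketch; the function names and the values of `lam` and `p` are illustrative assumptions, not taken from the paper.

```python
import math

def poisson_entropy_rate(rate):
    """Entropy rate mu * (1 - log mu) of a Poisson process, in nats per second."""
    return rate * (1.0 - math.log(rate))

def coin_entropy(p):
    """Entropy -p log p - (1 - p) log(1 - p) of a single coin toss, in nats."""
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

lam, p = 3.0, 0.25  # illustrative rate and coin bias (assumed values)

# Left-hand side: sum of the entropy rates of the two baby-processes.
split_side = poisson_entropy_rate(lam * p) + poisson_entropy_rate(lam * (1.0 - p))

# Right-hand side: entropy rate of the original process plus that of the coin tosses.
original_side = poisson_entropy_rate(lam) + lam * coin_entropy(p)

print(split_side, original_side)
```

The two printed numbers agree up to floating-point rounding, which is the algebraic identity only; the rigorous interpretation is what the rest of the paper supplies.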



2. Definitions and Setup 



This section provides technical definitions and sets up the framework for this paper. First, we present some preliminaries.

2.1. Preliminaries. Consider a measure space $(\Omega, \mathcal{F}, \mathbb{P})$, with $\mathbb{P}$ being a probability measure. Let $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ denote the measurable space on $\mathbb{R}$ with the Borel $\sigma$-algebra. A random variable $X$ is a measurable mapping from $\Omega$ to $\mathbb{R}$. Let $\mu_X$ denote the probability measure induced on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ by $X$. We call $X$ a discrete random variable if there is a countable subset $\{x_1, x_2, \ldots\}$ of $\mathbb{R}$ that forms a support for the measure $\mu_X$. Let $p_i = \mathbb{P}(X = x_i)$ and note that $\sum_i p_i = 1$.

The entropy of a discrete random variable is defined by the sum

$$H(X) = -\sum_i p_i \log p_i.$$

Note that this entropy is non-negative and has several well known properties. One natural interpretation of this number is in terms of the maximum compressibility (in bits per symbol) of an i.i.d. sequence of the random variables, $X$ (cf. Shannon's data compression theorem [Sha48]).

A random variable $Y$, defined on $(\Omega, \mathcal{F}, \mathbb{P})$, is said to be a continuous random variable if the probability measure, $\mu_Y$, induced on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ is absolutely continuous with respect to the Lebesgue measure. These probability measures can be characterized by a non-negative density function $f(x)$ that satisfies $\int_{\mathbb{R}} f(x)\,dx = 1$. The entropy (differential entropy) of a continuous random variable is defined by the integral

$$h(Y) = -\int_{\mathbb{R}} f(y) \log f(y)\,dy.$$

The entropy of a continuous random variable need not be non-negative, though it satisfies several of the other properties of the discrete entropy function. Since it can be negative, differential entropy clearly does not have the interpretation of maximal compressibility. However, it does have the interpretation of being the limiting difference between the maximally compressed quantization of the random variable and an identical quantization of an independent $U[0,1]$* random variable [CT91] as the quantization resolution goes to zero. Hence the term differential entropy is usually preferred to entropy when describing this number.
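
The quantization interpretation of differential entropy can be illustrated numerically. The sketch below assumes, purely for illustration, an exponential density of rate 1 (whose differential entropy is 1 nat), truncates its support to [0, 50], and compares its quantized entropy with that of an identically quantized U[0,1] variable. All function names are hypothetical.

```python
import math

def quantized_entropy(cdf, lo, hi, bins_per_unit):
    """Discrete entropy (nats) after quantizing to bins of width 1 / bins_per_unit on [lo, hi]."""
    delta = 1.0 / bins_per_unit
    n_bins = int(round((hi - lo) * bins_per_unit))
    entropy = 0.0
    for k in range(n_bins):
        # Probability mass of the k-th quantization bin.
        mass = cdf(lo + (k + 1) * delta) - cdf(lo + k * delta)
        if mass > 0.0:
            entropy -= mass * math.log(mass)
    return entropy

def exponential_cdf(y):
    """CDF of an exponential random variable with rate 1."""
    return 1.0 - math.exp(-y) if y > 0.0 else 0.0

def uniform_cdf(y):
    """CDF of a U[0,1] random variable."""
    return min(max(y, 0.0), 1.0)

def entropy_gap(bins_per_unit):
    """Quantized entropy of the exponential minus that of U[0,1] at the same resolution."""
    quantized_y = quantized_entropy(exponential_cdf, 0.0, 50.0, bins_per_unit)
    quantized_u = quantized_entropy(uniform_cdf, 0.0, 1.0, bins_per_unit)
    return quantized_y - quantized_u

for m in (10, 100, 1000):
    print(m, entropy_gap(m))
```

As the resolution increases, the printed gap approaches the differential entropy of the exponential density, even though each quantized entropy separately grows without bound.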

2.2. Our Setup. In this paper, we are interested in a set of random variables that incorporate the aspects of both discrete and continuous random variables. Let $Z = (X, Y)$ be a measurable mapping from the space $(\Omega, \mathcal{F}, \mathbb{P})$ to the space $(\mathbb{R} \times \mathbb{R}, \mathcal{B}_{\mathbb{R}} \times \mathcal{B}_{\mathbb{R}})$. Observe that this mapping induces a probability measure $\mu_Z$ on the space $(\mathbb{R} \times \mathbb{R}, \mathcal{B}_{\mathbb{R}} \times \mathcal{B}_{\mathbb{R}})$ as well as two probability measures $\mu_X$ and $\mu_Y$ on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ obtained via the projection of the measure $\mu_Z$.

Definition 2.1 (Mixed-Pair). Consider a random variable $Z = (X, Y)$. We call $Z$† a mixed-pair if $X$ is a discrete random variable while $Y$ is a continuous random variable. That is, the support of $\mu_Z$ is on the product space $\mathbb{S} \times \mathbb{R}$, where $\mathbb{S} = \{x_1, x_2, \ldots\}$ is a countable subset of $\mathbb{R}$. That is, $\mathbb{S}$ forms a support for $\mu_X$ while $\mu_Y$ is absolutely continuous with respect to the Lebesgue measure.

Observe that $Z = (X, Y)$ induces measures $\{\mu_1, \mu_2, \ldots\}$ that are absolutely continuous with respect to the Lebesgue measure, where $\mu_i(A) = \mathbb{P}(X = x_i, Y \in A)$, for every $A \in \mathcal{B}_{\mathbb{R}}$. Associated with these measures $\mu_i$, there are non-negative density functions $g_i(y)$ that satisfy

$$\mu_i(A) = \int_A g_i(y)\,dy \quad \text{for every } A \in \mathcal{B}_{\mathbb{R}}.$$

* $U[0,1]$ represents a random variable that is uniformly distributed on the interval $[0,1]$.

† For the rest of the paper we shall adopt the notation that random variables $X_i$ represent discrete random variables, $Y_i$ represent continuous random variables and $Z_i$ represent mixed-pairs of random variables.
Let us define $p_i = \int_{\mathbb{R}} g_i(y)\,dy$. Observe that the $p_i$'s are non-negative numbers that satisfy $\sum_i p_i = 1$ and correspond to the probability measure $\mu_X$. Further, $g(y) = \sum_i g_i(y)$ corresponds to the probability measure $\mu_Y$. Let

$$\tilde{g}_i(y) = \frac{1}{p_i}\, g_i(y)$$

be the probability density function of $Y$ conditioned on $X = x_i$.

The following non-negative sequence is well defined for every $y \in \mathbb{R}$ for which $g(y) > 0$,

$$\tilde{p}_i(y) = \frac{g_i(y)}{g(y)}, \quad i \geq 1.$$

Now $g(y)$ is finite except possibly on a set, $A$, of measure zero. For $y \in A^c$, we have that $\sum_i \tilde{p}_i(y) = 1$; $\tilde{p}_i(y)$ corresponds to the probability that $X = x_i$ conditioned on $Y = y$. It follows from the definitions of $p_i$ and $\tilde{p}_i(y)$ that

$$p_i = \int_{\mathbb{R}} \tilde{p}_i(y)\, g(y)\,dy.$$

Definition 2.2 (Good Mixed-Pair). A mixed-pair random variable $Z = (X, Y)$ is called good if the following condition is satisfied:

$$\sum_i \int_{\mathbb{R}} |g_i(y) \log g_i(y)|\,dy < \infty. \tag{2.1}$$

Essentially, the good mixed-pair random variables possess the property that when restricted to any of the $X$ values, the conditional differential entropy of $Y$ is well-defined. The following lemma provides simple sufficient conditions for ensuring that a mixed-pair variable is good.

Lemma 2.1. The following conditions are sufficient for a mixed-pair random variable to be a good pair:
(a) Random variable $Y$ possesses a finite $\epsilon$-th moment for some $\epsilon > 0$, i.e.

$$M_\epsilon = \int_{\mathbb{R}} |y|^\epsilon g(y)\,dy < \infty.$$

(b) There exists $\delta > 0$ such that $g(y)$ satisfies

$$\int_{\mathbb{R}} g(y)^{1+\delta}\,dy < \infty.$$

(c) The discrete random variable $X$ has finite entropy, i.e. $-\sum_i p_i \log p_i < \infty$.

Proof. The proof is presented in the appendix. □

Definition 2.3 (Entropy of a mixed-pair). The entropy of a good mixed-pair random variable is defined by

$$\mathbb{H}(Z) = -\sum_i \int_{\mathbb{R}} g_i(y) \log g_i(y)\,dy. \tag{2.2}$$
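
To make Definition 2.3 concrete, the sketch below approximates the mixed-pair entropy by a midpoint Riemann sum. The example is an assumption chosen for illustration: $X$ takes two values with probabilities 0.3 and 0.7, and $Y$ given $X = x_i$ is exponential with rate $r_i$, so that $g_i(y) = p_i r_i e^{-r_i y}$. For this choice the integral evaluates in closed form to $\sum_i [-p_i \log p_i + p_i(1 - \log r_i)]$, which the code uses as a cross-check.

```python
import math

def mixed_pair_entropy(densities, lo, hi, steps=100_000):
    """Approximate -sum_i int g_i(y) log g_i(y) dy with a midpoint Riemann sum on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for g in densities:
        for k in range(steps):
            value = g(lo + (k + 0.5) * width)
            if value > 0.0:  # treat 0 * log 0 as 0
                total -= value * math.log(value) * width
    return total

probs = [0.3, 0.7]   # assumed distribution of the discrete part X
rates = [1.0, 2.0]   # assumed exponential rates of Y given X = x_i

# g_i(y) = p_i * r_i * exp(-r_i * y): joint (sub-probability) densities of the mixed pair.
densities = [lambda y, p=p, r=r: p * r * math.exp(-r * y) for p, r in zip(probs, rates)]

# Truncating the integral at y = 60 discards a negligible tail for these rates.
numeric = mixed_pair_entropy(densities, 0.0, 60.0)

# Closed form: discrete entropy of X plus the averaged differential entropy of Y given X.
closed_form = sum(-p * math.log(p) + p * (1.0 - math.log(r)) for p, r in zip(probs, rates))

print(numeric, closed_form)
```

The closed form splits into the discrete entropy of $X$ and the average conditional differential entropy of $Y$, which is the sense in which the definition combines the two classical notions.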

Definition 2.4 (Vector of Mixed-Pairs). Consider a random vector $(Z_1, \ldots, Z_d) = \{(X_1, Y_1), \ldots, (X_d, Y_d)\}$. We call $(Z_1, \ldots, Z_d)$ a vector of mixed-pairs if the support of $\mu_{(Z_1, \ldots, Z_d)}$ is on the product space $\mathbb{S}^d \times \mathbb{R}^d$, where $\mathbb{S}^d \subset \mathbb{R}^d$ is a countable set. That is, $\mathbb{S}^d$ forms the support for the probability measure $\mu_{(X_1, \ldots, X_d)}$ while the measure $\mu_{(Y_1, \ldots, Y_d)}$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^d$.

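Definition 2.5 (Good Mixed-Pair Vector). A vector of mixed-pair random variables $(Z_1, \ldots, Z_d)$ is called good if the following condition is satisfied:

$$\sum_{\mathbf{x} \in \mathbb{S}^d} \int_{\mathbb{R}^d} |g_{\mathbf{x}}(\mathbf{y}) \log g_{\mathbf{x}}(\mathbf{y})|\,d\mathbf{y} < \infty, \tag{2.3}$$

where $g_{\mathbf{x}}(\mathbf{y})$ is the density of the continuous random vector $\mathbf{Y}^d$ conditioned on the event that $\mathbf{X}^d = \mathbf{x}$.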



Analogous to Lemma 2.1, the following conditions guarantee that a vector of mixed-pair random variables is good.

Lemma 2.2. The following conditions are sufficient for a vector of mixed-pair random variables to be good:
(a) Random vector $\mathbf{Y}^d$ possesses a finite $\epsilon$-th moment for some $\epsilon > 0$, i.e.

$$M_\epsilon = \int_{\mathbb{R}^d} \|\mathbf{y}\|^\epsilon g(\mathbf{y})\,d\mathbf{y} < \infty.$$

(b) There exists $\delta > 0$ such that $g(\mathbf{y})$ satisfies

$$\int_{\mathbb{R}^d} g(\mathbf{y})^{1+\delta}\,d\mathbf{y} < \infty.$$

(c) The discrete random vector $\mathbf{X}^d$ has finite entropy, i.e. $-\sum_{\mathbf{x} \in \mathbb{S}^d} p_{\mathbf{x}} \log p_{\mathbf{x}} < \infty$.

Proof. The proof is similar to that of Lemma 2.1 and is omitted. □

In the rest of the paper, all mixed-pair variables and vectors are assumed to be good, i.e. assumed to satisfy the condition (2.1) or, for vectors, its analogue (2.3).

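Definition 2.6 (Entropy of a mixed-pair vector). The entropy of a good mixed-pair vector of random variables is defined by

$$\mathbb{H}(\mathbf{Z}) = -\sum_{\mathbf{x} \in \mathbb{S}^d} \int_{\mathbb{R}^d} g_{\mathbf{x}}(\mathbf{y}) \log g_{\mathbf{x}}(\mathbf{y})\,d\mathbf{y}. \tag{2.4}$$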

Definition 2.7 (Conditional entropy). Given a pair of random variables $(Z_1, Z_2)$, the conditional entropy is defined as follows

$$\mathbb{H}(Z_1 | Z_2) = \mathbb{H}(Z_1, Z_2) - \mathbb{H}(Z_2).$$

It is not hard to see that $\mathbb{H}(Z_1 | Z_2)$ evaluates to

$$-\sum_{x_1, x_2} \int_{\mathbb{R}^2} g_{x_1, x_2}(y_1, y_2) \log \frac{g_{x_1, x_2}(y_1, y_2)}{g_{x_2}(y_2)}\,dy_1\,dy_2.$$

Definition 2.8 (Mutual Information). Given a pair of random variables $(Z_1, Z_2)$, the mutual information is defined as follows

$$I(Z_1; Z_2) = \mathbb{H}(Z_1) + \mathbb{H}(Z_2) - \mathbb{H}(Z_1, Z_2).$$

The mutual information evaluates to

$$\sum_{x_1, x_2} \int_{\mathbb{R}^2} g_{x_1, x_2}(y_1, y_2) \log \frac{g_{x_1, x_2}(y_1, y_2)}{g_{x_1}(y_1)\, g_{x_2}(y_2)}\,dy_1\,dy_2.$$

Using the fact that $1 + \log x \leq x$ for $x > 0$ it can be shown that $I(Z_1; Z_2)$ is non-negative.

2.3. Old Definitions Still Work. We will now present injections from the space of discrete (or continuous) random variables into the space of mixed-pair random variables so that the entropy of the mixed-pair random variable is the same as the discrete (or continuous) entropy.

Injection: Discrete into Mixed-Pair. Let $X$ be a discrete random variable with finite entropy. Let $\{p_1, p_2, \ldots\}$ denote the probability measure associated with $X$. Consider the mapping $\sigma_d : X \to Z = (X, U)$ where $U$ is an independent continuous random variable distributed uniformly on the interval $[0,1]$. For $Z$, we have $g_i(y) = p_i$ for $y \in [0,1]$. Therefore

$$
\begin{aligned}
\mathbb{H}(Z) &= -\sum_i \int_{\mathbb{R}} g_i(y) \log g_i(y)\,dy = \sum_i \int_0^1 -p_i \log p_i\,dy \\
&= -\sum_i p_i \log p_i = H(X) < \infty.
\end{aligned}
$$

Therefore we see that $\mathbb{H}(Z) = H(X)$.



Injection: Continuous into Mixed-Pair. Let $Y$ be a continuous random variable with a density function $g(y)$ that satisfies

$$\int_{\mathbb{R}} g(y) |\log g(y)|\,dy < \infty.$$

Consider the mapping $\sigma_c : Y \to Z = (X_0, Y)$ where $X_0$ is the constant random variable, say $\mathbb{P}(X_0 = 1) = 1$. Observe that $g(y) = g_1(y)$ and that the pair $Z = (X_0, Y)$ is a good mixed-pair that satisfies $\mathbb{H}(Z) = h(Y)$.

Thus $\sigma_d$ and $\sigma_c$ are injections from the space of discrete and continuous random variables, respectively, into the space of good mixed-pairs that preserve the entropy function.

2.4. Discrete-Continuous Variable as Mixed-Pair. Consider a random variable‡ $V$ whose support is a combination of both discrete and continuous. That is, it satisfies the following properties: (i) there is a countable set (possibly finite) $\mathbb{S} = \{x_1, x_2, \ldots\}$ such that $\mu_V(x_i) = p_i > 0$; (ii) there is a measure $\tilde{\mu}_V$ with an associated non-negative function $g(y)$ (absolutely continuous w.r.t. the Lebesgue measure); and (iii) the following holds:

$$\int_{\mathbb{R}} g(y)\,dy + \sum_i p_i = 1.$$

Thus, the random variable $V$ either takes discrete values $x_1, x_2, \ldots$ with probabilities $p_1, p_2, \ldots$ or else it is distributed according to the density function $\frac{1}{1-p}\, g(y)$, where $p = \sum_i p_i$. Observe that $V$ has neither a countable support nor is its measure absolutely continuous with respect to the Lebesgue measure. Therefore, though such random variables are encountered, neither the discrete entropy nor the continuous entropy is appropriate.

To overcome this difficulty, we will treat such variables as mixed-pair variables by appropriate injection 
of such variables into mixed-pair variables. Subsequently, we will be able to use the definition of entropy for 
mixed-pair variables. 

Injection: Discrete-Continuous into Mixed-Pair. Let $V$ be a discrete-continuous variable as considered above. Let the following two conditions be satisfied:

$$-\sum_i p_i \log p_i < \infty \quad \text{and} \quad \int_{\mathbb{R}} g(y) |\log g(y)|\,dy < \infty.$$

Consider the mapping $\sigma_m : V \to Z = (X, Y)$ described as follows: When $V$ takes a discrete value $x_i$, it is mapped on to the pair $(x_i, u_i)$ where $u_i$ is chosen independently and uniformly at random in $[0,1]$. When $V$ does not take a discrete value and, say, takes value $y$, it gets mapped to the pair $(x_0, y)$ where $x_0 \neq x_i$ for all $i$. One can think of $x_0$ as an indicator value that $V$ takes when it is not discrete. The mixed-pair variable $Z$ has its associated functions $\{g_0(y), g_1(y), \ldots\}$ where $g_i(y) = p_i$, $y \in [0,1]$, $i \geq 1$ and $g_0(y) = g(y)$. The entropy of $Z$ as defined earlier is

$$
\begin{aligned}
\mathbb{H}(Z) &= -\sum_i \int_{\mathbb{R}} g_i(y) \log g_i(y)\,dy \\
&= -\sum_i p_i \log p_i - \int_{\mathbb{R}} g(y) \log g(y)\,dy.
\end{aligned}
$$

Remark 2.1. In the rest of the paper we will treat every random variable that is encountered as a mixed-pair random variable. That is, a discrete variable or a continuous variable would be assumed to be injected into the space of mixed-pairs using the map $\sigma_d$ or $\sigma_c$, respectively.

3. Bijections and Entropy Preservation

In this section we will consider bijections between mixed-pair random variables and establish sufficient 
conditions under which the entropy is preserved. We first consider the case of mixed-pair random variables 
and then extend this to vectors of mixed-pair random variables. 



‡ Normally such random variables are referred to as mixed random variables.

3.1. Bijections between Mixed-Pairs. Consider mixed-pair random variables $Z_1 = (X_1, Y_1)$ and $Z_2 = (X_2, Y_2)$. Specifically, let $\mathbb{S}_1 = \{x_{1i}\}$ and $\mathbb{S}_2 = \{x_{2j}\}$ be the countable (possibly finite) supports of the discrete measures $\mu_{X_1}$ and $\mu_{X_2}$ such that $\mu_{X_1}(x_{1i}) > 0$ and $\mu_{X_2}(x_{2j}) > 0$ for all $x_{1i} \in \mathbb{S}_1$ and $x_{2j} \in \mathbb{S}_2$. Therefore a bijection between mixed-pair variables $Z_1$ and $Z_2$ can be viewed as a bijection between $\mathbb{S}_1 \times \mathbb{R}$ and $\mathbb{S}_2 \times \mathbb{R}$.

Let $F : \mathbb{S}_1 \times \mathbb{R} \to \mathbb{S}_2 \times \mathbb{R}$ be a bijection. Given $Z_1$, this bijection induces a mixed-pair random variable $Z_2$. We restrict our attention to the case when $F$ is continuous and differentiable§. Let the induced projections be $F_d : \mathbb{S}_1 \times \mathbb{R} \to \mathbb{S}_2$ and $F_c : \mathbb{S}_1 \times \mathbb{R} \to \mathbb{R}$. Let the associated projections of the inverse map $F^{-1} : \mathbb{S}_2 \times \mathbb{R} \to \mathbb{S}_1 \times \mathbb{R}$ be $F_d^{-1} : \mathbb{S}_2 \times \mathbb{R} \to \mathbb{S}_1$ and $F_c^{-1} : \mathbb{S}_2 \times \mathbb{R} \to \mathbb{R}$ respectively.

As before, let $\{g_i(y_1)\}$, $\{h_j(y_2)\}$ denote the non-negative density functions associated with the mixed-pair random variables $Z_1$ and $Z_2$ respectively. Let $(x_{2j}, y_2) = F(x_{1i}, y_1)$, i.e. $x_{2j} = F_d(x_{1i}, y_1)$ and $y_2 = F_c(x_{1i}, y_1)$. Now, consider a small neighborhood $x_{1i} \times [y_1, y_1 + dy_1)$ of $(x_{1i}, y_1)$. From the continuity of $F$, for small enough $dy_1$, the neighborhood $x_{1i} \times [y_1, y_1 + dy_1)$ is mapped to some small neighborhood of $(x_{2j}, y_2)$, say $x_{2j} \times [y_2, y_2 + dy_2)$. The measure of $x_{1i} \times [y_1, y_1 + dy_1)$ is $\approx g_i(y_1)|dy_1|$, while the measure of $x_{2j} \times [y_2, y_2 + dy_2)$ is $\approx h_j(y_2)|dy_2|$. Since the distribution of $Z_2$ is induced by the bijection from $Z_1$, we obtain

$$g_i(y_1) = \left|\frac{dy_2}{dy_1}\right| h_j(y_2). \tag{3.1}$$

Further, from $y_2 = F_c(x_{1i}, y_1)$ we also have

$$\left|\frac{dy_2}{dy_1}\right| = \left|\frac{\partial F_c(x_{1i}, y_1)}{\partial y_1}\right|. \tag{3.2}$$

These immediately imply a sufficient condition under which bijections between mixed-pair random variables preserve their entropies.



Lemma 3.1. If $\left|\frac{\partial F_c(x_{1i}, y_1)}{\partial y_1}\right| = 1$ for all points $(x_{1i}, y_1) \in \mathbb{S}_1 \times \mathbb{R}$, then $\mathbb{H}(Z_1) = \mathbb{H}(Z_2)$.



Proof. This essentially follows from the change of variables and repeated use of Fubini's theorem (to interchange the sums and the integral). To apply Fubini's theorem, we use the assumption that mixed-pair random variables are good. Observe that,

$$
\begin{aligned}
\mathbb{H}(Z_1) &= -\sum_i \int_{\mathbb{R}} g_i(y_1) \log g_i(y_1)\,dy_1 \\
&\overset{(a)}{=} -\sum_j \int_{\mathbb{R}} h_j(y_2) \log\left(h_j(y_2) \left|\frac{\partial F_c(x_{1i}, y_1)}{\partial y_1}\right|\right) dy_2 \\
&\overset{(b)}{=} -\sum_j \int_{\mathbb{R}} h_j(y_2) \log h_j(y_2)\,dy_2 \\
&= \mathbb{H}(Z_2).
\end{aligned}
\tag{3.3}
$$

Here (a) is obtained by repeated use of Fubini's theorem along with (3.1) and (3.2), with $(x_{1i}, y_1) = F^{-1}(x_{2j}, y_2)$, and (b) follows from the assumption of the Lemma that $\left|\frac{\partial F_c(x_{1i}, y_1)}{\partial y_1}\right| = 1$. □

3.2. Some Examples. In this section, we present some examples to illustrate our definitions, setup and 
the entropy preservation Lemma. 

Example 3.1. Let $Y_1$ be a continuous random variable that is uniformly distributed in the interval $[0,2]$. Let $X_2$ be the discrete random variable that takes value 0 when $Y_1 \in [0,1]$ and 1 otherwise. Let $Y_2 = Y_1 - X_2$. Clearly $Y_2 \in [0,1]$, is uniformly distributed and independent of $X_2$.

Let $Z_1 = (X_1, Y_1)$ be the natural injection, $\sigma_c$, of $Y_1$ (i.e. $X_1$ is just the constant random variable). Observe that the map from $Z_1$ to the pair $Z_2 = (X_2, Y_2)$ is a bijection that satisfies the conditions of Lemma 3.1, which implies

$$\log 2 = \mathbb{H}(Z_1) = \mathbb{H}(Z_2).$$



§ The continuity of a mapping between two copies of the product space $\mathbb{S} \times \mathbb{R}$ essentially means that the mapping is continuous with respect to the right (or $Y$) co-ordinate for fixed $x_i \in \mathbb{S}$. Similarly, differentiability essentially means differentiability with respect to the $Y$ co-ordinate.

However, also observe that by plugging in the various definitions of entropy in the appropriate spaces, $\mathbb{H}(Z_2) = \mathbb{H}(X_2, Y_2) = H(X_2) + h(Y_2) = \log 2 + 0$, where the first term is the discrete entropy and the second term is the continuous entropy. In general it is not difficult to see that the two definitions of entropy (for discrete and continuous random variables) are compatible with each other if the random variables themselves are thought of as a mixed-pair.

Example 3.2. This example demonstrates that some care must be taken when considering discrete and continuous variables as mixed-pair random variables. Consider the following continuous random variable $Y_1$ that is uniformly distributed in the interval $[0,2]$. Now, consider the mixed random variable $V_2$ that takes the value 2 with probability $\frac{1}{2}$ and takes a value uniformly distributed in the interval $[0,1]$ with probability $\frac{1}{2}$.

Clearly, there is a mapping that allows us to create $V_2$ from $Y_1$ by just mapping $Y_1 \in (1,2]$ to the value $V_2 = 2$ and by setting $V_2 = Y_1$ when $Y_1 \in [0,1]$. However, given $V_2 = 2$ we are not able to reconstruct $Y_1$ exactly. Therefore, intuitively one expects that $\mathbb{H}(Y_1) > \mathbb{H}(V_2)$.

However, if we use the respective injections, say $Y_1 \to Z_1$ and $V_2 \to Z_2$, into the space of mixed-pairs of random variables, we can see that

$$\mathbb{H}(Y_1) = \mathbb{H}(Z_1) = \log 2 = \mathbb{H}(Z_2).$$

This shows that if we think of $\mathbb{H}(Z_2)$ as the entropy of the mixed random variable $V_2$ we get an intuitively paradoxical result, $\mathbb{H}(Y_1) = \mathbb{H}(V_2)$, where in reality one would expect $\mathbb{H}(Y_1) > \mathbb{H}(V_2)$.

The careful reader will be quick to point out that the injection from $V_2$ to $Z_2$ introduces a new continuous variable, $Y_{22}$, associated with the discrete value of 2, as well as a discrete value $x_0$ associated with the continuous part of $V_2$. Indeed the "new" random variable $Y_{22}$ allows us to precisely reconstruct $Y_1$ from $Z_2$ and thus complete the inverse mapping of the bijection.
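
The entropies in Example 3.2 involve only piecewise-constant densities, so they can be evaluated directly. The helper name below is hypothetical; the densities are those produced by the injections described in Section 2.

```python
import math

def entropy_of_constant_density(level, length):
    """Value of -int g log g over an interval of the given length on which g equals `level`."""
    return -level * math.log(level) * length

# Y_1 ~ U[0, 2] injected as Z_1 = (X_0, Y_1): a single density equal to 1/2 on [0, 2].
h_z1 = entropy_of_constant_density(0.5, 2.0)

# V_2 injected as Z_2: the atom at 2 becomes g_1(y) = 1/2 on [0, 1] (uniform auxiliary
# variable), and the continuous part becomes g_0(y) = 1/2 on [0, 1].
h_z2 = entropy_of_constant_density(0.5, 1.0) + entropy_of_constant_density(0.5, 1.0)

print(h_z1, h_z2, math.log(2.0))
```

Both values equal $\log 2$: the auxiliary uniform variable attached to the atom carries exactly the information needed to invert the map.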

Remark 3.1. The examples show that when one has mappings involving various types of random variables and one wishes to use bijections to compare their entropies, one can perform this comparison as long as the random variables are thought of as mixed-pairs.

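3.3. Vector of Mixed-Pair Random Variables. Now, we derive sufficient conditions for entropy preservation under bijection between vectors of mixed-pair variables. To this end, let $\mathbf{Z}^1 = (Z^1_1, \ldots, Z^1_d)$ and $\mathbf{Z}^2 = (Z^2_1, \ldots, Z^2_d)$ be two vectors of mixed-pair random variables with their support on $\mathbb{S}_1 \times \mathbb{R}^d$ and $\mathbb{S}_2 \times \mathbb{R}^d$ respectively. (Here $\mathbb{S}_1, \mathbb{S}_2$ are countable subsets of $\mathbb{R}^d$.) Let $F : \mathbb{S}_1 \times \mathbb{R}^d \to \mathbb{S}_2 \times \mathbb{R}^d$ be a continuous and differentiable bijection that induces $\mathbf{Z}^2$ by its application on $\mathbf{Z}^1$.

As before, let the projections of $F$ be $F_d : \mathbb{S}_1 \times \mathbb{R}^d \to \mathbb{S}_2$ and $F_c : \mathbb{S}_1 \times \mathbb{R}^d \to \mathbb{R}^d$. We consider the situation where $F_c$ is differentiable. Let $g_i(\mathbf{y})$, $\mathbf{y} \in \mathbb{R}^d$, for $\mathbf{x}_i \in \mathbb{S}_1$ and $h_j(\mathbf{y})$, $\mathbf{y} \in \mathbb{R}^d$, for $\mathbf{w}_j \in \mathbb{S}_2$ be density functions as defined before. Let $(\mathbf{x}_i, \mathbf{y}^1) \in \mathbb{S}_1 \times \mathbb{R}^d$ be mapped to $(\mathbf{w}_j, \mathbf{y}^2) \in \mathbb{S}_2 \times \mathbb{R}^d$. Then, consider the $d \times d$ Jacobian

$$J(\mathbf{x}_i, \mathbf{y}^1) = \left[\frac{\partial y^2_k}{\partial y^1_l}\right]_{1 \leq k, l \leq d},$$

where we have used the notation $\mathbf{y}^1 = (y^1_1, \ldots, y^1_d)$ and $\mathbf{y}^2 = (y^2_1, \ldots, y^2_d)$. Now, similar to Lemma 3.1 we obtain the following entropy preservation for bijections between vectors of mixed-pair random variables.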

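Lemma 3.2. If for all $(\mathbf{x}_i, \mathbf{y}^1) \in \mathbb{S}_1 \times \mathbb{R}^d$,

$$|\det(J(\mathbf{x}_i, \mathbf{y}^1))| = 1,$$

then $\mathbb{H}(\mathbf{Z}^1) = \mathbb{H}(\mathbf{Z}^2)$. Here $\det(J)$ denotes the determinant of matrix $J$.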

Proof. The main ingredients for the proof of Lemma 3.1 for the scalar case were the equalities (3.1) and (3.2). For a vector of mixed-pair variables we will obtain the following equivalent equalities: For a change of $d\mathbf{y}^1$ at $(\mathbf{x}_i, \mathbf{y}^1)$, let $d\mathbf{y}^2$ be the induced change at $(\mathbf{w}_j, \mathbf{y}^2)$. Let $\mathrm{vol}(d\mathbf{y})$ denote the volume of the $d$-dimensional rectangular region with sides given by the components of $d\mathbf{y}$ in $\mathbb{R}^d$. Then,

$$g_i(\mathbf{y}^1)\,\mathrm{vol}(d\mathbf{y}^1) = h_j(\mathbf{y}^2)\,\mathrm{vol}(d\mathbf{y}^2). \tag{3.4}$$

Further, at $(\mathbf{x}_i, \mathbf{y}^1)$,

$$\mathrm{vol}(d\mathbf{y}^2) = |\det(J(\mathbf{x}_i, \mathbf{y}^1))|\,\mathrm{vol}(d\mathbf{y}^1). \tag{3.5}$$

Using exactly the same argument that is used in (3.3) (replacing $dy_k$ by $\mathrm{vol}(d\mathbf{y}^k)$, $k = 1, 2$), we obtain the desired result. This completes the proof of Lemma 3.2. □



Remark 3.2. Essentially, for every discrete choice $X$, if the mapping between the continuous vectors has a Jacobian determinant of absolute value one, then the bijection preserves entropy.
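
The determinant condition of Lemma 3.2 is easy to test for linear maps between the continuous coordinates. The sketch below uses a small Gaussian-elimination routine (an illustrative helper, not part of the paper) on three assumed examples: a permutation of coordinates, the shear $(y_1, y_2) \mapsto (y_1, y_2 - y_1)$, and a scaling that violates the condition.

```python
def determinant(matrix):
    """Determinant of a square matrix via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # Choose the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

permutation = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]  # reorders coordinates
shear = [[1.0, 0.0], [-1.0, 1.0]]                                  # (y1, y2) -> (y1, y2 - y1)
scaling = [[2.0, 0.0], [0.0, 1.0]]                                 # stretches y1 by 2

for name, jac in (("permutation", permutation), ("shear", shear), ("scaling", scaling)):
    print(name, abs(determinant(jac)))
```

The permutation and the shear satisfy $|\det J| = 1$, so Lemma 3.2 applies to them; the scaling does not, which is consistent with the standard fact that rescaling a continuous variable by $a$ shifts its differential entropy by $\log|a|$.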



4. Entropy Rate of Continuous Time Markov Chains 

A continuous time Markov chain is composed of the point process that characterizes the times of transitions of the states as well as the discrete states between which the transitions happen. Specifically, let $x_i \in \mathbb{R}$ denote the time of the $i$-th transition or jump with $i \in \mathbb{Z}$. Let $V_i \in \mathbb{S}$ denote the state of the Markov chain after the jump at time $x_i$, where $\mathbb{S}$ is some countable state space. For simplicity, we assume $\mathbb{S} = \mathbb{N}$. Let the transition probabilities be $p_{k\ell} = \mathbb{P}(V_i = \ell \mid V_{i-1} = k)$, $k, \ell \in \mathbb{N}$, for all $i$.

We recall that the entropy rate of a point process $\mathcal{P}$ was defined in Section 13.5 of [DVJ88] according to the following: "Observation of process conveys information of two kinds: the actual number of points observed and the location of these points given their number." This led them to define the entropy of a realization $\{x_1, \ldots, x_N\}$ as

$$\mathbb{H}(N) + \mathbb{H}(x_1, \ldots, x_N \mid N).$$

The entropy rate of the point process $\mathcal{P}$ is defined as follows: let $N(T)$ be the number of points arriving in the time interval $(0, T]$ and let their instances be $\mathbf{x}(T) = (x_1, \ldots, x_{N(T)})$. Then, the entropy rate of the process is

$$\mathbb{H}_{ER}(\mathcal{P}) = \lim_{T \to \infty} \frac{1}{T} \left[\mathbb{H}(N(T)) + \mathbb{H}(\mathbf{x}(T) \mid N(T))\right],$$

if the above limit exists.

We extend the above definition to the case of Markov chain in a natural fashion. Observation of a 
continuous time Markov chain over a time interval (0, T] conveys information of three types: the number of 
points/jumps of the chain in the interval, the location of the points given the number as well as the value 
of the chain after each jump. Treating each random variable as a mixed-pair allows us to consider all the 
random variables in a single vector. 

As before, let $N(T)$ denote the number of points in an interval $(0, T]$. Let $\mathbf{x}(T) = (x_1, \ldots, x_{N(T)})$ and $\mathbf{V}(T) = (V_0, V_1, \ldots, V_{N(T)})$ denote the locations of the jumps as well as the values of the chain after the jumps. This leads us to define the entropy of the process during the interval $(0, T]$ as

$$\mathbb{H}_{(0,T]} = \mathbb{H}(N(T), \mathbf{V}(T), \mathbf{x}(T)). \tag{4.1}$$

Observe that $(N(T), \mathbf{V}(T), \mathbf{x}(T))$ is a random vector of mixed-pair variables.

For a single state Markov chain the above entropy is the same as that of the point process that determines the jump/transition times. Similar to the development for point processes, we define the entropy rate of the Markov chain as

$$\mathbb{H}_{ER} = \lim_{T \to \infty} \frac{\mathbb{H}_{(0,T]}}{T}, \quad \text{if it exists.}$$

Proposition 4.1. Consider a Markov chain whose underlying point process is Poisson of rate $\lambda$, with stationary distribution $\pi = (\pi(i))$ and transition probability matrix $P = [p_{ij}]$. Then, its entropy rate is well-defined and

$$\mathbb{H}_{ER} = \lambda(1 - \log\lambda) + \lambda \mathbb{H}_{MC},$$

where $\mathbb{H}_{MC} = -\sum_i \pi(i) \sum_j p_{ij} \log p_{ij}$.
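
The formula of Proposition 4.1 can be evaluated for a small chain. The sketch below is illustrative only: it assumes a finite, irreducible, aperiodic chain so that plain power iteration converges to the stationary distribution, and the rate and transition matrix are made-up values.

```python
import math

def stationary_distribution(P, iterations=1000):
    """Approximate the stationary row vector pi = pi P by power iteration (assumes convergence)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def markov_entropy_rate(P):
    """H_MC = -sum_i pi(i) sum_j p_ij log p_ij, in nats per jump."""
    pi = stationary_distribution(P)
    n = len(P)
    return -sum(pi[i] * P[i][j] * math.log(P[i][j])
                for i in range(n) for j in range(n) if P[i][j] > 0.0)

def ctmc_entropy_rate(lam, P):
    """Entropy rate lam * (1 - log lam) + lam * H_MC from Proposition 4.1, in nats per second."""
    return lam * (1.0 - math.log(lam)) + lam * markov_entropy_rate(P)

lam = 3.0                      # assumed jump rate
P = [[0.9, 0.1], [0.4, 0.6]]   # assumed transition matrix
print(stationary_distribution(P), ctmc_entropy_rate(lam, P))

# A chain whose rows are identical is an i.i.d. coin toss at every jump.
P_coin = [[0.25, 0.75], [0.25, 0.75]]
print(ctmc_entropy_rate(lam, P_coin))
```

For the i.i.d. coin chain the result coincides with the sum of the two split Poisson entropy rates from Example 1.1, which is the identity exploited in Section 5.2.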

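Proof. For a Markov chain as described in the statement of the proposition, we wish to establish that

$$\lim_{T \to \infty} \frac{\mathbb{H}_{(0,T]}}{T} = \mathbb{H}_{ER},$$

as defined above. Now

$$
\begin{aligned}
\mathbb{H}_{(0,T]} &= \mathbb{H}(\mathbf{x}(T), N(T), \mathbf{V}(T)) \\
&= \mathbb{H}(\mathbf{x}(T), N(T)) + \mathbb{H}(\mathbf{V}(T) \mid N(T), \mathbf{x}(T)).
\end{aligned}
$$

Consider the first term on the right hand side of the above equality. This corresponds to the points of a Poisson process of rate $\lambda$. It is well-known (cf. equation (13.5.10), pg. 565 [DVJ88]) that

$$\lim_{T \to \infty} \frac{1}{T} \mathbb{H}(\mathbf{x}(T), N(T)) = \lambda(1 - \log\lambda). \tag{4.2}$$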

Now consider the term $\mathbb{H}(\mathbf{V}(T) \mid \mathbf{x}(T), N(T))$. Since $\mathbf{V}(T)$ is independent of $\mathbf{x}(T)$ given $N(T)$, we get from the definition of conditional entropy that

$$\mathbb{H}(\mathbf{V}(T) \mid \mathbf{x}(T), N(T)) = \mathbb{H}(\mathbf{V}(T) \mid N(T)). \tag{4.3}$$

One can evaluate $\mathbb{H}(\mathbf{V}(T) \mid N(T))$ as follows,

$$\mathbb{H}(\mathbf{V}(T) \mid N(T)) = \sum_k p_k\, \mathbb{H}(V_0, \ldots, V_k),$$

where $p_k$ is the probability that $N(T) = k$. The sequence of states $V_0, \ldots, V_k$ can be thought of as a sequence of states of a discrete time Markov chain with transition matrix $P$. For a Markov chain with stationary distribution $\pi$ (i.e. $\pi P = \pi$), it is well-known that

$$\lim_{k \to \infty} \frac{1}{k} \mathbb{H}(V_0, \ldots, V_k) = -\sum_i \pi(i) \sum_j p_{ij} \log p_{ij} = \mathbb{H}_{MC}.$$

Thus, for any $\epsilon > 0$, there exists $k(\epsilon)$ large enough such that for $k > k(\epsilon)$

$$\left|\frac{1}{k} \mathbb{H}(V_0, \ldots, V_k) - \mathbb{H}_{MC}\right| < \epsilon.$$

For $T$ large enough, using tail-probability estimates of a Poisson variable it can be shown that

$$\mathbb{P}(N(T) \leq k(\epsilon)) \leq \exp\left(-\frac{\lambda T}{8}\right).$$
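
Putting these together, we obtain that for given $\epsilon$ there exists $T(\epsilon)$ large enough such that for $T \geq T(\epsilon)$

$$
\begin{aligned}
\frac{\mathbb{H}(\mathbf{V}(T) \mid N(T))}{T} &= \sum_k \frac{k p_k}{T} \left(\frac{1}{k} \mathbb{H}(V_0, \ldots, V_k)\right) \\
&= \frac{\sum_{k \geq k(\epsilon)} k p_k (\mathbb{H}_{MC} \pm \epsilon) + O(k(\epsilon))}{T} \\
&= \frac{(\mathbb{H}_{MC} \pm \epsilon)\,(\lambda T + O(k(\epsilon)))}{T} \\
&= \lambda \mathbb{H}_{MC} \pm 2\epsilon.
\end{aligned}
$$

That is,

$$\lim_{T \to \infty} \frac{\mathbb{H}(\mathbf{V}(T) \mid N(T))}{T} = \lambda \mathbb{H}_{MC}.$$

Combining (4.2), (4.3) and the above equation we complete the proof of Proposition 4.1. □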



Fact 4.1 (cf. Ch. 13.5 [DVJ88]). Consider the set of stationary ergodic point processes with mean rate $\lambda$. Then the entropy of this collection is maximized by a Poisson process with rate $\lambda$. That is, if $\mathcal{P}$ is a stationary ergodic point process with rate $\lambda$ then

$$\mathbb{H}_{ER}(\mathcal{P}) \leq \lambda(1 - \log\lambda).$$



5. Application 



5.1. Computation of continuous entropies. In this section we show how our previous results aid in the computation of traditional continuous entropies. Let $(X_1, X_2)$ be two i.i.d. continuous random variables whose distribution satisfies the conditions required of a good mixed-pair. Let $(Y_1, Y_2)$, $Y_1 \leq Y_2$, be the ordering of $(X_1, X_2)$. Then the following holds:

Lemma 5.1. $h(Y_1, Y_2) = h(X_1, X_2) - \log 2$.

Proof. Let $I$ represent the indicator function such that $I = 0$ implies $X_1 = Y_1$, and $I = 1$ implies $X_1 = Y_2$. Clearly $I$ is independent of $(Y_1, Y_2)$ and the probability of $I = 1$ is $\frac{1}{2}$. It is further easy to see that $(I, Y_1, Y_2) \leftrightarrow (X_1, X_2)$ is a bijection whose Jacobians have determinant of absolute value 1. Thus, viewed as mixed-pairs we can equate the entropies, yielding

$$\log 2 + h(Y_1, Y_2) = h(X_1, X_2).$$

This can of course be shown using traditional methods, but the proof here is an illustration of how our results can be used to obtain such results in an easier fashion. □

In a similar fashion, if $(Y_1, \ldots, Y_n)$ is an ordering, in increasing order, of the i.i.d. random variables $(X_1, \ldots, X_n)$, then

$$h(Y_1, \ldots, Y_n) = h(X_1, \ldots, X_n) - \log(n!).$$
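
Lemma 5.1 can be cross-checked in a case where every entropy has a closed form. The sketch assumes, for illustration, that $X_1, X_2$ are i.i.d. exponential with an arbitrary rate, and uses two standard facts that are not part of the paper: the minimum of two such variables is exponential with twice the rate, and the gap $Y_2 - Y_1$ is exponential with the original rate and independent of $Y_1$.

```python
import math

def exponential_entropy(rate):
    """Differential entropy 1 - log(rate) of an exponential density, in nats."""
    return 1.0 - math.log(rate)

rate = 1.7  # illustrative rate (assumed value)

# h(X1, X2): the two variables are independent, so their entropies add.
h_unordered = 2.0 * exponential_entropy(rate)

# (Y1, Y2 - Y1) -> (Y1, Y2) is a shear with unit Jacobian determinant, so
# h(Y1, Y2) = h(Y1) + h(Y2 - Y1), with Y1 ~ Exp(2 * rate) and Y2 - Y1 ~ Exp(rate).
h_ordered = exponential_entropy(2.0 * rate) + exponential_entropy(rate)

print(h_ordered, h_unordered - math.log(2.0))
```

The two printed values coincide, matching the $\log 2$ gap in Lemma 5.1, which is the entropy of the indicator $I$ lost by sorting.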

5.2. Poisson Splitting via Entropy Preservation. In this section, we use the sufficient conditions developed in Lemma 3.2 to obtain a proof of the following property.

Lemma 5.2. Consider a Poisson process, $\mathcal{P}$, of rate $\lambda$. Split the process $\mathcal{P}$ into two baby-processes $\mathcal{P}_1$ and $\mathcal{P}_2$ as follows: for each point of $\mathcal{P}$, toss an independent coin of bias $p$. Assign the point to $\mathcal{P}_1$ if the coin turns up heads, else assign it to $\mathcal{P}_2$. Then, the baby-processes $\mathcal{P}_1$ and $\mathcal{P}_2$ have the same entropy rate as Poisson processes of rates $\lambda p$ and $\lambda(1-p)$ respectively.

Proof. Consider a Poisson process, $\mathcal{P}$, of rate $\lambda$ in the interval $[0, T]$. Let $N(T)$ be the number of points in this interval and let $\mathbf{a}(T) = \{a_1, \ldots, a_{N(T)}\}$ be their locations. Further, let $\mathbf{C}(T) = \{C_1, \ldots, C_{N(T)}\}$ be the outcomes of the coin-tosses and $M(T)$ denote the number of heads among them. Denote by $\mathbf{r}(T) = \{R_1, \ldots, R_{M(T)}\}$ and $\mathbf{b}(T) = \{B_1, \ldots, B_{N(T)-M(T)}\}$ the locations of the points of the baby-processes $\mathcal{P}_1$, $\mathcal{P}_2$ respectively.

It is easy to see that the following bijection holds:

$$\{\mathbf{a}(T), \mathbf{C}(T), N(T), M(T)\} \rightleftharpoons \{\mathbf{r}(T), \mathbf{b}(T), N(T) - M(T), M(T)\}. \tag{5.1}$$

Given the outcomes of the coin-tosses $\mathbf{C}(T)$, $\{\mathbf{r}(T), \mathbf{b}(T)\}$ is a permutation of $\mathbf{a}(T)$. Hence, the Jacobian corresponding to any realization of $\{\mathbf{C}(T), N(T), M(T)\}$ that maps $\mathbf{a}(T)$ to $\{\mathbf{r}(T), \mathbf{b}(T)\}$ is a permutation matrix, i.e. its determinant is $\pm 1$.

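Therefore, Lemma 3.2 implies that

$$
\begin{aligned}
\mathbb{H}(\mathbf{a}(T), \mathbf{C}(T), N(T), M(T)) &= \mathbb{H}(\mathbf{r}(T), \mathbf{b}(T), N(T) - M(T), M(T)) \\
&\leq \mathbb{H}(\mathbf{b}(T), N(T) - M(T)) + \mathbb{H}(\mathbf{r}(T), M(T)).
\end{aligned}
\tag{5.2}
$$

$M(T)$ is completely determined by $\mathbf{C}(T)$ and it is easy to deduce from the definitions that

$$\mathbb{H}(M(T) \mid \mathbf{a}(T), \mathbf{C}(T), N(T)) = 0.$$

Hence

$$
\begin{aligned}
\mathbb{H}(\mathbf{a}(T), \mathbf{C}(T), N(T), M(T)) &= \mathbb{H}(\mathbf{a}(T), \mathbf{C}(T), N(T)) + \mathbb{H}(M(T) \mid \mathbf{a}(T), \mathbf{C}(T), N(T)) \\
&= \mathbb{H}(\mathbf{a}(T), \mathbf{C}(T), N(T)).
\end{aligned}
\tag{5.3}
$$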
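
Since the outcomes of the coin-tosses along with their locations form a continuous time Markov chain, using Proposition 4.1 we can see that

$$
\begin{aligned}
\lim_{T \to \infty} \frac{1}{T} \mathbb{H}(\mathbf{a}(T), \mathbf{C}(T), N(T), M(T)) &= \lim_{T \to \infty} \frac{1}{T} \mathbb{H}(\mathbf{a}(T), \mathbf{C}(T), N(T)) \\
&= \lambda(1 - \log\lambda) - \lambda(p \log p + (1-p)\log(1-p)) \\
&= \lambda p(1 - \log\lambda p) + \lambda(1-p)(1 - \log\lambda(1-p)).
\end{aligned}
\tag{5.4}
$$

It is well known that $\mathcal{P}_1$, $\mathcal{P}_2$ are stationary ergodic processes of rates $\lambda p$, $\lambda(1-p)$ respectively. Hence from Fact 4.1 we have

$$
\begin{aligned}
\lim_{T \to \infty} \frac{1}{T} \mathbb{H}(\mathbf{r}(T), M(T)) &\leq \lambda p(1 - \log\lambda p), \\
\lim_{T \to \infty} \frac{1}{T} \mathbb{H}(\mathbf{b}(T), N(T) - M(T)) &\leq \lambda(1-p)(1 - \log\lambda(1-p)).
\end{aligned}
\tag{5.5}
$$

Combining equations (5.2), (5.4), (5.5) we can obtain

$$
\begin{aligned}
\lim_{T \to \infty} \frac{1}{T} \mathbb{H}(\mathbf{r}(T), M(T)) &= \lambda p(1 - \log\lambda p), \\
\lim_{T \to \infty} \frac{1}{T} \mathbb{H}(\mathbf{b}(T), N(T) - M(T)) &= \lambda(1-p)(1 - \log\lambda(1-p)).
\end{aligned}
\tag{5.6}
$$

Thus, the entropy rates of processes $\mathcal{P}_1$ and $\mathcal{P}_2$ are the same as those of Poisson processes of rates $\lambda p$ and $\lambda(1-p)$ respectively. This completes the proof of Lemma 5.2. □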

6. Conclusions 

This paper deals with notions of entropy for random variables that are mixed-pairs, i.e. pairs of discrete and continuous random variables. Our definition of entropy is a natural extension of the known discrete and differential entropy. Situations where both continuous and discrete variables arise are common in the analysis of randomized algorithms that are often employed in networks of queues, load balancing systems, etc. We hope that the techniques developed here will be useful for the analysis of such systems and for computing entropy rates for the processes encountered in these systems.

Appendix

6.1. Proof of Lemma 2.1. We wish to establish that the conditions of Lemma 2.1 guarantee that

$$\sum_i \int_{\mathbb{R}} g_i(y) |\log g_i(y)|\,dy < \infty. \tag{6.1}$$

Let $(a)_+ = \max(a, 0)$ and $(a)_- = \min(a, 0)$ for $a \in \mathbb{R}$. Then,

$$a = a_+ + a_-, \quad \text{and} \quad |a| = a_+ - a_-.$$

By definition $g_i(y) \geq 0$. Observe that

$$|\log g_i(y)| = 2(\log g_i(y))_+ - \log g_i(y).$$
Therefore to guarantee (6.1) it suffices to show the following two conditions:

$$\sum_i \int_{\mathbb{R}} g_i(y) (\log g_i(y))_+\,dy < \infty, \tag{6.2}$$

$$\left|\sum_i \int_{\mathbb{R}} g_i(y) \log g_i(y)\,dy\right| < \infty. \tag{6.3}$$

The next two lemmas show that equations (6.2) and (6.3) are satisfied and hence complete the proof of Lemma 2.1.

Lemma 6.1. Let $Y$ be a continuous random variable with a density function $g(y)$ such that for some $\delta > 0$

$$\int_{\mathbb{R}} g(y)^{1+\delta}\,dy < \infty.$$

Further, if $g(y)$ can be written as a sum of non-negative functions $g_i(y)$, then

$$\sum_i \int_{\mathbb{R}} g_i(y) (\log g_i(y))_+\,dy < \infty.$$

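Proof. For given $\delta$, there exists a finite $B_\delta > 1$ such that for $x \geq B_\delta$, $\log x \leq x^\delta$. Using this, we obtain

$$
\begin{aligned}
\int_{\mathbb{R}} g_i(y) (\log g_i(y))_+\,dy &= \int_{g_i(y) \geq 1} g_i(y) \log g_i(y)\,dy \\
&= \int_{1 \leq g_i(y) < B_\delta} g_i(y) \log g_i(y)\,dy + \int_{B_\delta \leq g_i(y)} g_i(y) \log g_i(y)\,dy \\
&\leq \log B_\delta \int_{\mathbb{R}} g_i(y)\,dy + \int_{\mathbb{R}} g_i(y)^{1+\delta}\,dy \\
&= p_i \log B_\delta + \int_{\mathbb{R}} g_i(y)^{1+\delta}\,dy.
\end{aligned}
\tag{6.4}
$$

Therefore,

$$
\begin{aligned}
\sum_i \int_{\mathbb{R}} g_i(y) (\log g_i(y))_+\,dy &\leq \sum_i \left(p_i \log B_\delta + \int_{\mathbb{R}} g_i(y)^{1+\delta}\,dy\right) \\
&\overset{(a)}{=} \log B_\delta + \int_{\mathbb{R}} \sum_i g_i(y)^{1+\delta}\,dy \\
&\overset{(b)}{\leq} \log B_\delta + \int_{\mathbb{R}} g(y)^{1+\delta}\,dy < \infty.
\end{aligned}
$$

In (a) we use the fact that $g_i(y)$ is non-negative to interchange the sum and the integral. In (b), we again use the fact that $g_i(y) \geq 0$ to bound $\sum_i g_i(y)^{1+\delta}$ with $\left(\sum_i g_i(y)\right)^{1+\delta}$.

□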

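Lemma 6.2. In addition to the hypothesis on $Y$ in Lemma 6.1, assume that $Y$ has a finite $\epsilon$-th moment for some $\epsilon > 0$. Then the following holds:

$$\left|\sum_i \int_{\mathbb{R}} g_i(y) \log g_i(y)\,dy\right| < \infty.$$

Proof. Let, for some $\epsilon > 0$,

$$M_\epsilon = \int_{\mathbb{R}} |y|^\epsilon g(y)\,dy < \infty.$$

Note that for any $\epsilon > 0$, there is a constant $C_\epsilon > 0$ such that $\int_{\mathbb{R}} C_\epsilon e^{-|y|^\epsilon}\,dy = 1$. Further, observe that the density $\tilde{g}_i(y) = g_i(y)/p_i$ is absolutely continuous w.r.t. the density $f(y)$ $(= C_\epsilon e^{-|y|^\epsilon})$. Thus from the fact that the Kullback-Leibler distance $D(\tilde{g}_i \| f)$ is non-negative we have

$$
\begin{aligned}
0 &\leq \int_{\mathbb{R}} g_i(y) \log \frac{g_i(y)}{p_i f(y)}\,dy = \int_{\mathbb{R}} g_i(y) \log \frac{g_i(y)}{p_i C_\epsilon e^{-|y|^\epsilon}}\,dy \\
&= \int_{\mathbb{R}} g_i(y) \log g_i(y)\,dy - p_i \log p_i - p_i \log C_\epsilon + \int_{\mathbb{R}} |y|^\epsilon g_i(y)\,dy.
\end{aligned}
$$

Therefore

$$-\int_{\mathbb{R}} g_i(y) \log g_i(y)\,dy \leq -p_i \log p_i + p_i |\log C_\epsilon| + \int_{\mathbb{R}} |y|^\epsilon g_i(y)\,dy. \tag{6.5}$$

From (6.4) we have

$$\int_{\mathbb{R}} g_i(y) \log g_i(y)\,dy \leq \int_{\mathbb{R}} g_i(y) (\log g_i(y))_+\,dy \leq p_i \log B_\delta + \int_{\mathbb{R}} g_i(y)^{1+\delta}\,dy. \tag{6.6}$$

Combining equations (6.5) and (6.6), we obtain

$$\left|\int_{\mathbb{R}} g_i(y) \log g_i(y)\,dy\right| \leq -p_i \log p_i + p_i |\log C_\epsilon| + \int_{\mathbb{R}} |y|^\epsilon g_i(y)\,dy + p_i \log B_\delta + \int_{\mathbb{R}} g_i(y)^{1+\delta}\,dy. \tag{6.7}$$

Now using the facts

$$-\sum_i p_i \log p_i < \infty, \quad \sum_i \int_{\mathbb{R}} |y|^\epsilon g_i(y)\,dy = \int_{\mathbb{R}} |y|^\epsilon g(y)\,dy < \infty,$$

and $\sum_i \int_{\mathbb{R}} g_i(y)^{1+\delta}\,dy \leq \int_{\mathbb{R}} g(y)^{1+\delta}\,dy < \infty$, we obtain from (6.7) that

$$\left|\sum_i \int_{\mathbb{R}} g_i(y) \log g_i(y)\,dy\right| < \infty.$$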